AN EMPIRICAL EVALUATION OF FACTOR RELIABILITY



Douglas N. Jackson Martin E. Morf 

University of Western Ontario University of Windsor 

Abstract 

The psychometric reliability of a factor, defined as its generalizability
across samples drawn from the same population of tests, is considered as a
necessary precondition for the scientific meaningfulness of factor analytic
results. A solution to the problem of generalizability is illustrated
empirically on data from a set of tests designed to measure facets of
response styles and of personality dimensions. Parallel sets of measures
based on personality scales defining each of seven factors were separately
factored. Independent sets of component scores derived from the orthogonal
least squares fit to the oblique factor pattern matrix were computed, and
these component scores were intercorrelated between the two sets, yielding
factor reliabilities whose values ranged from .65 to .85 (p < .0001 for
each factor). A corresponding analysis based on scores derived from random
binary data yielded nonsignificant factor reliabilities ranging from -.12
to +.07. It was recommended that such a test of factor generalizability be
incorporated routinely into factor analytic investigations, particularly
those employing Procrustes-type rotations.




The present study has two major aims: (a) to propose a method of
estimating the psychometric reliability or generalizability of a set of
factors; and (b) to apply this method to a set of empirical data whose
reliability has been questioned in the literature.

The simplest and least controversial, but also the least informative,
definition of a factor is "a set of loadings." Such sets of loadings are
obtained from correlation matrices by procedures like the principal factor
method (Harman, 1967). If unities are inserted in the main diagonal of the
correlation matrix, this procedure yields a mathematically elegant and unique
solution. However, some or all of the factors obtained may reflect
pseudo-relationships based entirely on chance, while others may account for
real relationships in psychologically nonmeaningful ways. Such preliminary
factor solutions thus raise two problems: (1) Which factors reflect true
common variance as distinguished from error variance? (2) If a set of
factors does not merely reflect chance relationships, how may axes be rotated
to yield psychologically meaningful factors?

A variety of characteristics permit one to make inferences about the
significance of a factor; among these are the size of its eigenvalue, the
standard errors of its loadings, and its contribution to the communalities
of the variables (Cliff & Hamburger, 1967). Since the principal factor method
extracts factors in order of size (as reflected by their eigenvalues), the
first question can be rephrased to: How many factors should be retained for
rotation and interpretation? The question of which, or how many, factors to
retain has been approached in a number of ways. First, rules of thumb have
been applied (e.g., Kaiser, 1960). Second, mathematically derived statistics
reflecting the significance of factors have been developed (Joreskog, 1967,
1969; Lawley, 1943; Rao, 1955). Third, real data and random data have
been factor analyzed together and only the factors based on real data with
eigenvalues greater than the largest eigenvalue of a random factor retained
(Horn, 1965).
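The random-data comparison in the third approach can be sketched as follows. This is a minimal illustration rather than Horn's (1965) procedure in detail: it assumes normally distributed random data of the same dimensions as the real data, analyzes the two data sets separately, and uses a single random replicate. The function name `count_factors_above_random` and the simulated scores are hypothetical.

```python
import numpy as np

def count_factors_above_random(data, rng):
    """Count real-data eigenvalues larger than the largest random-data eigenvalue."""
    n_subjects, n_vars = data.shape
    real_eig = np.linalg.eigvalsh(np.corrcoef(data, rowvar=False))
    # Random normal deviates with the same number of subjects and variables.
    noise = rng.standard_normal((n_subjects, n_vars))
    random_eig = np.linalg.eigvalsh(np.corrcoef(noise, rowvar=False))
    return int(np.sum(real_eig > random_eig.max()))

rng = np.random.default_rng(0)
# Hypothetical data: 200 subjects, 10 tests, two common factors of five tests each.
factors = rng.standard_normal((200, 2))
pattern = np.zeros((2, 10))
pattern[0, :5] = 0.8
pattern[1, 5:] = 0.8
data = factors @ pattern + 0.6 * rng.standard_normal((200, 10))

n_retained = count_factors_above_random(data, rng)
print(n_retained)
```

In practice one would probably average over many random replicates rather than rely on one, since the largest random eigenvalue itself varies from sample to sample.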

None of these methods provides unequivocal answers. The answers are
contingent on the assumptions of the method that underlies them, on whether
it is the subjects or the variables that are treated as a population, and
upon which of several properties of factors the emphasis is placed. While
preliminary, unrotated factors obtained by some extraction method may
reflect a pseudo-meaningful structure attributable to chance, rotated factors
are even more likely to do so. Rotation has been identified as occupying a
critical role in the possible capitalization upon chance implicit in the
emergence of a pseudo-meaningful structure based on random data. This is
especially true when the method of rotation is "procrustean" and when the
constraint of orthogonality does not interfere with the maximization of
loadings in accordance with the investigator's theory (Horn, 1967; Humphreys,
Ilgen, McGrath, & Montanelli, 1969). As a result, the factor analytic
investigator is faced with the dilemma of blind rotation providing
mathematically satisfactory but not necessarily psychologically meaningful
solutions (Guilford & Hoepfner, 1969; Saunders, 1960) versus rotation that
increases the possibility of a pseudo-meaningful structure.

Some ways around this dilemma are suggested by examination of the
criteria to be met by satisfactory factor solutions. Guilford and Hoepfner
(1969), for example, suggest that the factors obtained should be amenable to
investigation by nonfactor-analytical means, should fit relevant psychological
theory, and should be replicable. Kaiser and Caffrey (1965) have elaborated
the notion of factor replicability by distinguishing between statistical
replication across samples of observations and psychometric replication across
samples of variables. Other properties of satisfactory solutions are low
standard errors of the loadings and small deviation of the means of the
sampling distributions of loadings from the population parameters (Cliff &
Pennell, 1967; Pennell, 1968).

Focus on the properties of satisfactory solutions has led to the
investigation of the effects on them of various independent variables by means
of Monte Carlo simulation studies and to the assessment of factor invariance
and other properties of factor solutions in specific analyses of real data.
A number of useful rules of thumb, helpful in preventing the emergence of
pseudo-meaningful structure, have emerged from the first of these two
categories of studies. Horn (1967) and Humphreys et al. (1969) factor analyzed
randomly generated data and obtained results which suggest that the number
of observations and the ratio of variables to factors should be higher, and
the ratio of variables to subjects lower, than they are in most studies.
Cliff and Pennell (1967) and Pennell (1968) constructed population factor
matrices and generated large numbers of sample correlation matrices implied
by them. They extracted preliminary factors from the correlation matrices
and rotated them to the best least-squares fit with the original population
factor matrix (Cliff, 1966). Their results indicate that the larger the sample
size, communality, and factor size, the greater the consistency, and the
smaller the bias, of the loadings.

Given the rules of thumb which follow from studies like these, it is
possible to design reasonably sound factor analytical studies. The specific
adequacy of each individual study, however, requires separate examination.
The present study focuses on this problem. The adequacy of a specific
factor analytical solution obtained in an earlier study (Morf & Jackson, in
press) and subjected to some intuitive criticism by Block (1971) regarding its
supposed chance basis, is tested by examining the invariance of the factors
across two parallel subsets of measures included in the original battery.

The psychometric reliability of a factor solution is but one aspect
that could be investigated. Factor invariance or replicability over samples
of tests is, however, a necessary condition for drawing generalizable
conclusions regarding results. Replicability has frequently been described
as the "minimum requirement of science." Demonstrating factor replicability,
in the psychometric sense, is tantamount to demonstrating that chance alone
does not account for the results.



Method 



The basic data for this study have been published by Morf and Jackson
(in press). Since that report fully described procedures for data collection
and substantive interpretation of primary and second-order factors, these
issues will not be highlighted here. Briefly, the study was designed to elicit
responses from 196 liberal arts undergraduates, 87 males and 109 females, to a
personality questionnaire of 560 items comprising 49 nonoverlapping scales
relevant to eight factors, four attributable to content (Exhibition, Play,
Succorance, and Understanding), and four attributable to response styles
(True Responding, Item Endorsement, Desirability, and Adjective Endorsement).
A facet design was employed, in which each substantive personality scale was
designed to load one content factor and at least one response style factor.
Reference tests for response style factors were also included. The results
from the Morf and Jackson principal axis analysis and rotation to a clustran
(Bentler, 1971) criterion yielded unusually clear support for their
hypothesized factors, with virtually all tests appropriately and substantially
loading the hypothesized dimensions. The findings did not, however, convince
Block (1971), who, citing Horn (1967) and Humphreys et al. (1969), attributed
them to chance. Even though the Morf and Jackson study more than met
the Humphreys et al. recommendation of at least four tests defining a
factor (indeed, there were more than 30 defining each of the major acquiescence
dimensions), the fact that, as far as the authors were aware, no satisfactory
test of psychometric reliability had been reported provided an impetus for
the present investigation.



Method of Analysis 

Each of the two sets of scales was separately and independently factored
and subjected to an independent analytic patterned rotation, an orthogonal
Procrustes rotation, and two separate matrices of component scores were then
computed. These two sets of component scores were then intercorrelated. The
correlations between corresponding component scores for a given factor could
then be evaluated for statistical significance and for reliability.







The first step in the analytic treatment was to divide the set of
variables into two sets. This was done in such a way as to place an equal
number of tests hypothesized to reflect each factor into each set. All test
scores were centered at the mean of the test and scaled to have a unit
standard deviation. These two matrices of standardized scores were
intercorrelated separately within each battery, unities were retained in the
diagonal, and, because there were seven hypothesized factors, seven principal
components factors were extracted from each battery. Because the Adjective
Endorsement factor had been defined by only three variables which differed in
desirability level, it was not possible to obtain parallel sets of variables
to define it. Hence the scales originally defining this factor were dropped
from the analysis, as were two additional variables not loading highly on
any factor, Infrequency and Sex.
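The standardization and extraction steps just described might be sketched as follows. This is an illustrative sketch under stated assumptions: the scores are random placeholders rather than the study data, the odd/even split into two batteries is hypothetical (the study balanced the split by hypothesized factor), and the function name `principal_components` is introduced here only for illustration.

```python
import numpy as np

def principal_components(scores, n_factors):
    """Standardize test scores and extract the largest principal components."""
    z = (scores - scores.mean(axis=0)) / scores.std(axis=0)   # mean 0, SD 1
    r = z.T @ z / z.shape[0]          # correlation matrix, unities in the diagonal
    eigvals, eigvecs = np.linalg.eigh(r)
    keep = np.argsort(eigvals)[::-1][:n_factors]              # largest eigenvalues first
    eigvals, eigvecs = eigvals[keep], eigvecs[:, keep]
    loadings = eigvecs * np.sqrt(eigvals)                     # loading matrix A
    return z, loadings, eigvals

rng = np.random.default_rng(1)
scores = rng.standard_normal((196, 46))           # placeholder scores, not the study data
set_a, set_b = scores[:, 0::2], scores[:, 1::2]   # hypothetical split into two batteries
z_a, a_a, eig_a = principal_components(set_a, 7)
z_b, a_b, eig_b = principal_components(set_b, 7)
```

Each battery is analyzed without reference to the other, which is what makes the later comparison of component scores a test of generalizability across sets of tests.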

The basic procedure for the orthogonal Procrustes solution was to place
axes analytically at the centroid of the respective hypothesized salient
test vectors and then to find the orthogonal rotation fitting this oblique
solution in a least squares sense. A procedure developed by Horst (1965,
pp. 394-397) was employed to transform each of the principal axis factor
matrices separately into alignment with their respective hypothesized
patterns. The transposed principal axis factor loading matrix is postmultiplied
by a binary hypothesis matrix. The matrix product is premultiplied by the
reciprocal of the eigenvalues associated with the largest principal
components, and the resulting matrix normalized by columns. This matrix thus
serves as an oblique transformation matrix, h, which is used to postmultiply
the initial principal axis factor loading matrix, A, to yield a primary
component pattern matrix, b (cf. Kaiser, 1962), representing loadings of tests
on oblique axes.

\[ b = A h . \tag{1} \]

A proof of the rationale on which this method is based is provided by
Horst (1965, pp. 411-412).
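The construction of h and b described above might look as follows in code. This is a sketch under assumptions, not Horst's published procedure: "normalized by columns" is taken to mean scaling each column of h to unit length, the hypothesis matrix is taken to contain only zeros and ones, and the six-test battery, the function name `oblique_pattern`, and the simulated scores are all hypothetical.

```python
import numpy as np

def oblique_pattern(a, eigvals, hypothesis):
    """Return the oblique transformation h and the pattern matrix b = A h."""
    h = (a.T @ hypothesis) / eigvals[:, None]   # A'H premultiplied by reciprocal eigenvalues
    h = h / np.linalg.norm(h, axis=0)           # assumed: columns scaled to unit length
    return h, a @ h

rng = np.random.default_rng(2)
# Hypothetical battery: six tests, the first three marking factor 1, the rest factor 2.
hypothesis = np.array([[1, 0], [1, 0], [1, 0], [0, 1], [0, 1], [0, 1]], dtype=float)
common = rng.standard_normal((300, 2)) @ (0.8 * hypothesis.T)
scores = common + 0.6 * rng.standard_normal((300, 6))

eigvals, eigvecs = np.linalg.eigh(np.corrcoef(scores, rowvar=False))
eigvals, eigvecs = eigvals[::-1][:2], eigvecs[:, ::-1][:, :2]   # two largest components
a = eigvecs * np.sqrt(eigvals)                                  # principal axis loadings

h, b = oblique_pattern(a, eigvals, hypothesis)
print(np.round(b, 2))
```

In this toy case the hypothesized salient tests receive the largest loadings on their own oblique axis, because each axis is placed through the centroid of the tests assigned to it.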

Bentler's (1968, 1971) clustran criterion, based on a proof due to
Gibson (1962), yields an orthonormal rotated factor loading matrix, B,
fitting b in a least squares sense. If

\[ h = P \delta^{1/2} Q' , \tag{2} \]

then Gibson (1962) has proved that by removing the diagonal matrix,
$\delta^{1/2}$, one obtains a matrix T, which will transform the original
principal axis factor matrix, A, into B:

\[ T = P Q' , \tag{3} \]

and

\[ B = A T . \tag{4} \]

Computationally, T may be obtained by first extracting eigenvalues and
associated eigenvectors from the minor product moment of h, h'h. The
eigenvectors will correspond to Q. P may be obtained by premultiplying Q by h
and then postmultiplying the result by the diagonal matrix comprising the
reciprocal square roots of the eigenvalues.
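The computation of T just described can be sketched in a few lines. The sketch assumes h is square and nonsingular, so that all eigenvalues of h'h are positive; the function name `orthonormal_transformation` and the example matrix are hypothetical.

```python
import numpy as np

def orthonormal_transformation(h):
    """Drop the diagonal matrix from h = P d^(1/2) Q' to obtain T = P Q'."""
    d, q = np.linalg.eigh(h.T @ h)      # eigenvalues d and eigenvectors Q of h'h
    p = (h @ q) / np.sqrt(d)            # P = h Q d^(-1/2), scaled column by column
    return p @ q.T

rng = np.random.default_rng(3)
# Hypothetical, well-conditioned oblique transformation with unit-length columns.
h = np.eye(3) + 0.3 * rng.standard_normal((3, 3))
h = h / np.linalg.norm(h, axis=0)

t = orthonormal_transformation(h)
print(np.round(t.T @ t, 6))             # should be an identity matrix
```

The same T results from a singular value decomposition of h with the singular values replaced by ones, which offers a convenient numerical check.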

From matrix T, component scores, Y, may be calculated for each set
(cf. Kaiser, 1962), using the original principal axis factor matrix, A, its
associated eigenvalues, $\delta$, and the original standardized data matrix,
Z, for the N subjects:

\[ K = A \delta^{-1} T , \tag{5} \]

\[ L = K / \sqrt{N} , \tag{6} \]

\[ Y = Z L . \tag{7} \]

Because the component scores are based on an orthonormal transformation
of a principal axis factor matrix, the intercorrelation of these scores
within each set will confirm their orthogonality:

\[ I = Y' Y . \tag{8} \]
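A numerical check of the orthogonality property may be helpful. The sketch below assumes that the weights are formed by scaling the columns of A by the reciprocal eigenvalues, applying T, and dividing by the square root of the number of subjects; the last scaling is an assumption chosen so that Y'Y is an identity matrix. The data are random placeholders and the function name `component_scores` is hypothetical.

```python
import numpy as np

def component_scores(z, a, eigvals, t):
    """Rotated component scores Y = Z L, with L = A d^(-1) T / sqrt(N)."""
    k = (a / eigvals) @ t                   # columns of A scaled by reciprocal eigenvalues
    weights = k / np.sqrt(z.shape[0])       # assumed scaling so that Y'Y = I
    return z @ weights

rng = np.random.default_rng(4)
raw = rng.standard_normal((196, 10)) @ rng.standard_normal((10, 10))   # placeholder data
z = (raw - raw.mean(axis=0)) / raw.std(axis=0)

eigvals, eigvecs = np.linalg.eigh(z.T @ z / z.shape[0])
eigvals, eigvecs = eigvals[::-1][:3], eigvecs[:, ::-1][:, :3]   # three largest components
a = eigvecs * np.sqrt(eigvals)

t, _ = np.linalg.qr(rng.standard_normal((3, 3)))   # any orthonormal T serves for the check
y = component_scores(z, a, eigvals, t)
print(np.round(y.T @ y, 6))
```

The check holds for any orthonormal T, which is why the orthogonality of the scores within a set carries no information about the quality of the rotation itself.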



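Finally, component scores from each set are correlated to yield a matrix
of correlations between the estimates of component scores derived from
different sets of tests:

\[
R =
\begin{bmatrix} Y_1'Y_1 & Y_1'Y_2 \\ Y_2'Y_1 & Y_2'Y_2 \end{bmatrix}
=
\begin{bmatrix} R_{11} & R_{12} \\ R_{21} & R_{22} \end{bmatrix} .
\tag{9}
\]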



If the factors are listed in the same order within each battery, the diagonal
of $R_{12}$ will contain the estimates of psychometric factor reliability.
These estimates may be corrected by the Spearman-Brown formula, if an
evaluation of the reliability of factors derivable from the entire set is to
be made. This procedure may readily be generalized to any number of subsets of
tests. Of course, as the number of subsets of tests increases, there may be
increasing difficulty in defining, within each subset, tests of sufficient
quality to define reliable factors.
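The reliability estimates and their Spearman-Brown correction might be computed as sketched below. The score matrices are simulated stand-ins for the component scores of the two sets, built to correlate about .8 factor by factor; the function names and the `n_parts` parameter (the number of subsets of tests) are introduced here for illustration only.

```python
import numpy as np

def factor_reliabilities(y1, y2):
    """Correlate corresponding columns of two component score matrices."""
    z1 = (y1 - y1.mean(axis=0)) / y1.std(axis=0)
    z2 = (y2 - y2.mean(axis=0)) / y2.std(axis=0)
    return (z1 * z2).mean(axis=0)           # diagonal of the heteroset block

def spearman_brown(r, n_parts=2):
    """Step a part-battery reliability up to a battery of n_parts parts."""
    return n_parts * r / (1 + (n_parts - 1) * r)

rng = np.random.default_rng(5)
y1 = rng.standard_normal((196, 7))                      # hypothetical scores, Set A
y2 = 0.8 * y1 + 0.6 * rng.standard_normal((196, 7))     # hypothetical scores, Set B

half = factor_reliabilities(y1, y2)     # reliability of factors from half the battery
full = spearman_brown(half)             # estimate for the entire set of tests
print(np.round(half, 2), np.round(full, 2))
```

For example, a hypothetical between-set correlation of .60 steps up to 2(.60)/(1 + .60) = .75 for the full battery.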



Wrigley and Neuhaus (Harman, 1967, pp. 271-272) have defined a coefficient
of congruence for measuring the degree of factorial similarity between two
sets of tests for the same sample of individuals. This is calculated by
dividing the sum of the crossproducts of factor scores of two factors by the
geometric mean of the respective variances. If component scores in standard
score form, as in the present study, are employed in the Wrigley and Neuhaus
formula, their formula will give results identical to those given by (9).
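The equivalence can be seen in one line. In the sketch below, $y_{i1}$ and $y_{i2}$ denote the scores of subject $i$ on corresponding factors from the two sets and $\phi_{12}$ denotes the coefficient of congruence; this notation is introduced here for illustration, and the denominator is written, as an assumption about the form of the coefficient, as the square root of the product of the two sums of squares.

```latex
\phi_{12}
  = \frac{\sum_i y_{i1}\, y_{i2}}
         {\sqrt{\bigl(\sum_i y_{i1}^{2}\bigr)\bigl(\sum_i y_{i2}^{2}\bigr)}}
  = \sum_i y_{i1}\, y_{i2}
  \qquad\text{when}\qquad
  \sum_i y_{i1}^{2} = \sum_i y_{i2}^{2} = 1 .
```

Because component scores scaled so that $Y'Y = I$ have zero means and unit sums of squares, the right-hand side is both the product-moment correlation and the corresponding element of $Y_1'Y_2$.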

In a similar manner, a parallel analysis was undertaken on two sets of
random data. Random binary digits were generated corresponding to each of
560 items answered by 196 subjects and were scored using the same scoring
keys employed on the real subjects. Scale scores were divided into two
subsets of scales, the same sets employed previously. The identical factor
analysis and computation of component scores was undertaken on these subsets,
yielding two rotated factor loading matrices and two 196 by seven arrays of
component scores, one for each set. These arrays, when intercorrelated, could
be interpreted as the degree of stability manifested by the hypothetical
subjects on two independently identified sets of seven latent dimensions.
Although there is no reason in factor theory to suppose that random data of
this type would yield evidence of stability across independent sets, even if
these sets have been rotated to reflect the same factors, and from one point
of view this demonstration is trivial, this sort of analysis nevertheless
might serve to dispel any lingering doubts. There are, of course, a variety
of other points at which randomness might have been introduced into the
analysis. Rather than scores based on random binary digits, scores derived from
random normal deviates might have been employed, for example, or real data
might have been assigned randomly to subjects in one of the sets. It would
not be a good use of time to evaluate these alternatives, which, indeed,
would demonstrate only that factor scores based on separate sets of random
data have a population correlation of zero. One suggested alternative which
would clearly be inappropriate would be to use real data, but to base the
variable scores on random keys. To the extent that the hypothesized general
factors were present in the data, or to the extent that random keys tapped
common factors, factor scores derived from such keys would not correlate zero.
Such a procedure would merely be a further, although unsystematic, test of
our hypotheses. In any case, the analysis of random data is an adjunct to
the major analysis, which focuses on the reliability of factor scores
derived from real data. Although possibly trivial and gratuitous, it does
serve to emphasize the independence of analyses as between the two sets.
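The generation and scoring of the random data might be sketched as follows. The item-to-scale assignment and the keying directions below are hypothetical stand-ins, since the actual scoring keys are not reproduced here; only the dimensions (196 subjects, 560 items, 49 nonoverlapping scales) come from the text.

```python
import numpy as np

rng = np.random.default_rng(6)
n_subjects, n_items, n_scales = 196, 560, 49

# Random binary responses: 1 = "true", 0 = "false".
responses = rng.integers(0, 2, size=(n_subjects, n_items))

# Hypothetical keys: nonoverlapping item blocks, each keyed true (1) or false (0).
item_blocks = np.array_split(np.arange(n_items), n_scales)
keyed_response = rng.integers(0, 2, size=n_scales)

# A scale score is the number of items answered in the keyed direction.
scale_scores = np.column_stack([
    (responses[:, items] == key).sum(axis=1)
    for items, key in zip(item_blocks, keyed_response)
])

r = np.corrcoef(scale_scores, rowvar=False)
largest_off_diagonal = np.abs(r - np.eye(n_scales)).max()
print(scale_scores.shape, round(float(largest_off_diagonal), 2))
```

Because the scales share no items and the responses are independent, the scale intercorrelations differ from zero only through sampling fluctuation.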

Table 1 presents the scales and the hypothesis matrix. The scale labels
are described in detail by Morf and Jackson (in press).

Insert Table 1 about here

For present purposes it is sufficient to note the following: (1) that in the
case of the four letter labels the first letter (E, P, S, U, or H) stands for
the content reflected by the scale (E stands for Exhibition, P for Play, S for
Succorance, U for Understanding, and H for heterogeneous content), the second
letter (A or S) stands for attitude item or self-descriptive format, the third
(P or N) stands for positive or negative wording, and the fourth for true or
false (T or F) keying; (2) that the first letter of the three letter labels
(F) stands for scales consisting of California F Scale items, differing in
wording (A stands for absolute wording, R for relative wording) and keying
(T stands for true, and F for false keying); (3) that DA and DB stand for the
parallel Desirability scales of Forms A and B of the Personality Research Form
(Jackson, 1967).

Results



Factor Analytic Results 

The factor loadings obtained in the factor analyses of the 23 scales
comprising Set A are shown in Table 2. Those for the factor analyses of the
23 scales of Set B are presented in Table 3. These results may be summarized
for each set in a parallel manner.

Factor I: Set A and Set B

Clearly this factor is associated with the direction of keying of the
scales. In both sets, the true and false keyed F scales have extreme loadings.

Factor II: Set A and Set B

This factor is identified as an item endorsement factor, with the positive
pole marked by a tendency to endorse personality scale content, and the
negative pole, to deny it. Forty-one of the 44 hypothesized loadings were
in the expected direction for this factor. Only three small loadings for
Set B were exceptions.

Factor III

Negatively worded true keyed and positively worded false keyed scales

Desirability (A)        .69 (Set A)
Desirability (B)        .84 (Set B)

This factor represents a tendency to respond desirably or undesirably.

Factors IV, V, VI, VII

Set A                           Set B

Exhibition SPF       .69        Exhibition SPT       .49
Exhibition SNT       .60        Exhibition SNF       .47
Exhibition APT       .61        Exhibition APF       .60
Exhibition ANF       .41        Exhibition ANT       .38

Play SPT             .65        Play SPF             [illegible]
Play SNF             .63        Play SNT             .64
Play APF             .41        Play APT             .63
Play ANT             .59        Play ANF             .53

Sentience SPF        .65        Sentience SPT        .55
Sentience SNT        .64        Sentience SNF        .47
Sentience APT        .50        Sentience APF        .60
Sentience ANF        .53        Sentience ANT        .63

Understanding SPT    .71        Understanding SPF    .70
Understanding SNF    .63        Understanding SNT    .53
Understanding APF    .59        Understanding APT    .48
Understanding ANT    .34        Understanding ANF    .59

These factors clearly represent the four content dimensions represented
by the respective scale names.

Tables 2 and 3 present the orthogonal rotated factor loading matrix for 
Set A and for Set B, respectively. Each table first presents the real data 
analysis and then the random data analysis. 







Insert Tables 2 and 3 about here 



Evaluations of the goodness of fit of factors based on the random and
real data solutions. The random and real data factor solutions can be
compared with respect to a number of indices reflecting goodness of fit to the
target matrix. Figure 1 compares them in terms of two indices.

Insert Figure 1 about here

The first, taking into account only the direction of relevant loadings, is
most appropriate for Factors I and II, which are defined by a large number of
variables. The second, taking into account both the direction and relative
size of relevant loadings, is more appropriate for the content factors defined
by only four variables each. The criterion used here was whether or not the
predicted loadings were the highest obtained for the factor.

The Morf and Jackson (in press) study was designed to permit the emergence
of Factors I (true responding) and II (item endorsement). Almost all
variables analyzed were, therefore, relevant to their definition in the sense
that these factors were determined by almost all variables. As Table 1 shows,
in the present parallel analyses, 22 of the 23 variables were hypothesized to
load in a specific direction on these two factors. Except for the near zero
loadings of three variables on Factor II for Set B, all 88 relevant loadings
were in the specified direction for the real data. In the case of the random
data, however, 59 of these 88 loadings were in the nonpredicted direction.
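The first index, the number of relevant loadings falling in the hypothesized direction, might be tallied as in the sketch below. The loadings and the matrix of predicted signs are hypothetical values chosen for illustration, with 0 marking a loading for which no direction was hypothesized.

```python
import numpy as np

def count_predicted_directions(loadings, predicted_sign):
    """Count relevant loadings whose sign matches the hypothesized direction."""
    relevant = predicted_sign != 0
    hits = np.sign(loadings[relevant]) == predicted_sign[relevant]
    return int(hits.sum()), int(relevant.sum())

# Hypothetical loadings of five scales on two factors.
loadings = np.array([
    [0.62, 0.31],
    [-0.55, 0.28],
    [0.48, -0.02],
    [-0.51, 0.35],
    [0.10, 0.05],
])
# Hypothesized directions: +1, -1, or 0 for "no hypothesis".
predicted_sign = np.array([[1, 1], [-1, 1], [1, 1], [-1, 1], [0, 0]])

hits, total = count_predicted_directions(loadings, predicted_sign)
print(hits, "of", total, "relevant loadings in the predicted direction")
```

Applied to the counts in the text, 59 of 88 random-data loadings in the nonpredicted direction leaves 88 - 59 = 29 in the predicted direction.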

The single relevant variable for Factor III, the desirability scale,
obtained the highest loading in the predicted direction in the two real, but
not in the two random, data analyses. In the real data solutions, the four
relevant variables defining each content factor obtained the largest loadings
in the specified direction, while in the random set only 18 of these 32
variables obtained loadings in the predicted direction and also exceeded the
largest irrelevant loading. Thus, although it is difficult to quantify the
degree of goodness of fit to the target matrix, these informal comparisons
suggest that a clearly better fit was obtained for the real than for the
random data.

A chi square calculated on the random data solutions to test whether
the surprisingly small numbers of loadings on Factors I and II in the specified
direction deviated significantly from the numbers one would expect to load in
the specified direction on the basis of chance proved to be significant at
the .05 level. The fit to the target matrix of these two factors is thus
somewhat worse than one would expect on the basis of chance. This might be
surprising at first glance, but becomes clearer when one recognizes that the
clustran rotation procedure had fewer constraints operating in fitting the
factors defined by few relevant variables, leaving itself very little leeway
to fit the two factors defined by many variables. If this interpretation has
merit, this finding has a bearing on the conclusions of Humphreys et al.
regarding the critical role of the number of defining variables for a factor.
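One way such a chi square could be computed is sketched below. The text does not report how the statistic was actually obtained, so the sketch rests on loudly stated assumptions: the 88 loadings are pooled across factors and sets, each loading is treated as independent and equally likely to fall in either direction by chance, and no continuity correction is applied. The function name `chi_square_vs_chance` is hypothetical.

```python
import math

def chi_square_vs_chance(n_predicted, n_total, p_chance=0.5):
    """One-df chi square comparing observed direction counts with chance counts."""
    expected_hits = n_total * p_chance
    expected_misses = n_total - expected_hits
    observed_misses = n_total - n_predicted
    statistic = ((n_predicted - expected_hits) ** 2 / expected_hits
                 + (observed_misses - expected_misses) ** 2 / expected_misses)
    # Upper-tail probability of a chi square with one degree of freedom.
    p_value = math.erfc(math.sqrt(statistic / 2))
    return statistic, p_value

# 29 of 88 random-data loadings were in the predicted direction (88 - 59).
statistic, p_value = chi_square_vs_chance(29, 88)
print(round(statistic, 2), p_value < 0.05)
```

Under these assumptions the statistic is 2(15²)/44, or about 10.2, which is consistent with the reported significance at the .05 level, although the authors' own computation may have differed.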

Real and random data factor reliabilities. Although the real data
solutions seem to fit the target matrix better than the random data solutions,
these results alone do not establish the psychometric reliability of the
factors. In order to accomplish this, the component scores of each subject on
each of the seven factors of the four solutions were computed as outlined in
the previous section. Table 4 presents the correlations obtained between
these component scores on corresponding real data and random data factors.

Insert Table 4 about here

Also presented are factor reliabilities, obtained by applying the
Spearman-Brown formula to these correlation coefficients. The real and random
data are distinguished by these correlations considerably more clearly than by
their respective fits to the target matrix. All correlations obtained for
the real data are significant at the .0001 level, while none of those
obtained for the random data are significant at the .05 level.
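The significance of a between-set correlation with 196 subjects can be checked approximately as follows. The sketch uses the Fisher z approximation with a two-tailed normal probability, which is an assumption, since the text does not name the test used; treating the extreme values quoted in the abstract as correlations is likewise an assumption made only for illustration.

```python
import math

def correlation_p_value(r, n):
    """Two-tailed p value for a correlation, using the Fisher z approximation."""
    z = math.atanh(r) * math.sqrt(n - 3)
    return math.erfc(abs(z) / math.sqrt(2))

n_subjects = 196
for r in (0.65, 0.85, -0.12, 0.07):     # extreme values quoted in the abstract
    print(r, correlation_p_value(r, n_subjects))
```

With this sample size a correlation needs an absolute value of roughly .14 to reach the .05 level, so values between -.12 and +.07 fall short, while values of .65 and above are far beyond the .0001 level.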

Table 5 presents the upper half of the supermatrix of correlations
between component scores in the two sets.

Insert Table 5 about here

The intercorrelations of the variables for Set A form an identity matrix
within the limits of rounding error. The same is true for the correlations of
component scores within Set B, which are not presented. Correlations between
component scores based on corresponding factors from the two sets are
presented in the right-hand section of Table 5. It will be noted that factor
reliabilities, comprising the minor diagonal, are substantially higher than
off-diagonal elements of the heteroset submatrix.

The null hypothesis (cf. Block, 1971) that the Morf-Jackson factors are
due to capitalization on chance in the rotation of axes may be rejected with
a substantial degree of confidence.

Discussion



The papers by Horn (1967) and by Humphreys et al. (1969) raised
profound questions regarding the interpretation of the results from factor
analytic studies. They emphasized the point that the apparent meaningfulness
of a factor structure was no guarantee that it necessarily reflected
true common factor variance among variables. A casual reading of the latter
papers might suggest that one might place little confidence in the results
of factor analyses conducted on sample sizes of less than a very substantial
number. The present investigation focuses on the problem of factor
reliability, and suggests a means of interpreting the psychometric reliability
of factors, by seeking evidence for stability in factor or component scores
across independent sets of tests. The method proposed tends to emphasize
the parameters identified by Humphreys et al. as crucial, namely, the ratio
of the number of variables to the number of factors, and the number of
subjects. Our method requires a sufficient number of variables to permit
partition of the set of variables into two subsets, and a sufficient number of
subjects to yield statistically significant psychometric factor reliabilities.
If these parameters are satisfactorily large, a significance test may be
undertaken on the consistency of factor scores, and a decision reached
regarding the probable psychometric significance of factors. A rejection of
the null hypothesis under these circumstances would imply that the set of
factors is replicable across distinct batteries of tests. Thus, it might
be concluded that results are not due purely to chance, because, as our
analysis of random data illustrated, there is no reason to expect data wholly
lacking in psychometric reliability to show stability across separate subsets
of tests. Thus, chance effects, as uncovered by Humphreys and Horn, cannot
operate to contribute to factor reliability as here defined.

A finding of substantial psychometric factor reliabilities would permit
the inference that the factors independently identified by separate batteries
tended to reflect the same processes. These two inferences are different in
the same sense that evidence for merely a statistically significant
reliability might be differentiated from evidence for a substantial reliability
for a given test. As with individual tests, one's confidence in the
psychometric reliability of a factor might be a linear function of the magnitude
of the consistency of factor scores across samples of tests. Thus,
statistically significant factor reliabilities might be considered a necessary
but not a sufficient criterion for judging the adequacy of a factorial
solution. Because of the arbitrariness of rotation, some correlation might be
expected between separate solutions so long as axes were oriented in such a
way as to be mutually correlated to some degree. A high psychometric
factor reliability would imply that each set of tests defining the factor
tends to reflect the factor univocally, and, furthermore, that the final
rotated solution tends to identify independent samples of subjects' scores
along a common dimension.

The rationale for psychometric reliability undertaken here can hardly
be considered novel, at least in the context of classical univariate test
theory. The notion that a set of items comprises a sample from a hypothetical
universe has been incorporated in a number of formulations (see Bock &
Wood, 1971, for a review), and implicitly, at least, seems to have been
appreciated at least 60 years ago when Spearman (1910) and Brown (1910)
published their classic articles on the effect of test length upon
reliability. Curiously, this kind of thinking, while occasionally appearing in
theoretical articles on factor analysis, has had almost no impact on the
practice of factor analysis, where there is often a dearth of good reference
tests, frequently insufficient to permit subsets each capable of defining a
factor. A well-known kit of reference tests (French, Ekstrom, & Price, 1963),
for example, lists only three basic tests per factor. Until the advent of
the computer, analyses were sufficiently laborious to discourage parallel
replication for the purpose of appraising psychometric generalization.
Furthermore, only a minority of investigators (e.g., Horst, 1965) have
focused attention on the measurement of individuals based upon factor
analysis; in most cases the factor loading matrix is of considerably more
interest than the matrix of factor scores. But one would have little
confidence in factor analytic results if measures based on one set of reference
tests were wholly independent of those based on a second set of putatively
parallel tests. Furthermore, the previous objection of undue computational
labor in employing factor scores is hardly relevant to the present
availability of modern, high-speed computing facilities. Certainly other
approaches, such as those which might derive from intraclass correlation under
various assumptions, might represent viable alternatives to the one proposed
here.

It should be noted that our analysis made no attempt to focus on what 
Kaiser and Caffrey (1965) termed the statistical reliability of factors. The 
latter authors suggest that a completely general solution to the problem of 
factor reliability will probably take into account both the psychometric and 
the statistical reliability of factors, but note that this would be a rather 
complex problem. The evidence to date suggests that evaluation of the statis-
tical reliability of factor solutions will have to wrestle not only with the
problems posed by Horn and Humphreys et al., but also with more recent findings
by Nesselroade and Baltes (1970) that attempts at factor matching based upon
such optimal criteria as least squares provide a relatively satisfactory fit
for random data. In addition, further work will have to be undertaken to
define more clearly the concept of a subject population because, as has been
demonstrated (Tucker, 1966), the factorial structure describing dif-
ferent types of subjects may vary both in terms of number and nature of the
obtained factors, and in terms of factor correlations.

Summary and Conclusions



1. The psychometric reliability of a factor, defined as its generaliz-
ability across the population of tests hypothesized to measure the factor,
may be appraised empirically by correlating factor scores based on indepen-
dently analyzed parallel subsets of tests.

2. Product-moment correlations so obtained may be tested for statis-
tical significance and may be corrected by the Spearman-Brown formula to
yield an index of reliability.

3. When psychometric factor reliability analysis was applied to fac-
tor scores generated by tests hypothesized to reflect three response style
and four content factors, all reliability coefficients were significant at
the .0001 level, thus failing to support a conjecture made by Block that
these factors were due to capitalization on chance.

4. When a similar analysis was applied to random data, none of the
psychometric factor reliabilities departed significantly from zero.

5. Comparison of the results from real and random data analyses sup-
ported the critical role of the number of tests defining each factor.

6. An evaluation of the psychometric reliability of a factor should
be undertaken routinely in factor studies, particularly in those employing
rotation to optimize fit to a set of hypotheses.
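
The computation described in points 1 and 2 can be sketched roughly as follows. This is only an illustrative sketch in Python, not the procedure or program used in the study: the function names (pearson, spearman_brown, factor_reliability) and the small score lists are hypothetical, and the sketch assumes that factor scores for the same subjects have already been computed separately from the two parallel subsets of tests.

```python
import math


def pearson(x, y):
    """Product-moment correlation between two equal-length score lists."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length lists with at least 2 scores")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def spearman_brown(r, k=2.0):
    """Spearman-Brown estimate of reliability when test length is multiplied by k."""
    return k * r / (1.0 + (k - 1.0) * r)


def factor_reliability(scores_a, scores_b):
    """Return (uncorrected, corrected) reliability for one factor.

    scores_a and scores_b hold the same subjects' scores on the factor,
    derived from two independently analyzed parallel subsets of tests.
    """
    r = pearson(scores_a, scores_b)
    return r, spearman_brown(r)


# Hypothetical factor scores for six subjects (illustration only).
set_a = [0.2, -1.1, 0.7, 1.5, -0.4, 0.1]
set_b = [0.5, -0.8, 0.3, 1.1, -0.9, 0.4]
r, r_corrected = factor_reliability(set_a, set_b)
print(round(r, 2), round(r_corrected, 2))

# An uncorrected correlation of .56 between parallel halves steps up to .72.
print(round(spearman_brown(0.56), 2))  # 0.72
```

With k = 2 the correction treats the two parallel subsets as halves of the full battery, which is why an uncorrected value of .56 corresponds to a corrected value of .72.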

Footnotes

1

Reprint requests should be directed to Douglas N. Jackson, Department
of Psychology, University of Western Ontario, London 72, Ontario, CANADA.
The authors appreciate the helpful comments provided by Harry H. Harman,
Walter Kristof, Ingram Olkin, and Roger Pennell regarding the reporting of
this study. Thanks are also due to William Krane, who assisted in generat-
ing the random data. Supported in part by Research Grant No. 397 from the
Ontario Mental Health Foundation, and a Special Research Fellowship from
the National Institute of Mental Health, U.S. Public Health Service to
Douglas N. Jackson and a Research Grant to Martin E. Morf from Canada
Council.

2 

This study was completed while Douglas N. Jackson was a Visiting
Scholar, Division of Psychological Studies, Educational Testing Service,
Princeton, New Jersey.

3 

Although, strictly speaking, the rotation of axes does not occur in 
certain modern approaches to factor analysis, such as the one proposed by 
Jöreskog (1969), the analogous problem of the fitting of parameters on the
basis of observed data remains.

4 

Factor III, the Desirability Factor, with only a single hypothesized
high loading, is not evaluated in Figure 1.

Table 1

[Table body largely lost in extraction. Surviving fragment: 0 1 0; 11. SAPT, SAFE; 1 -1 1 -1 0 0]

Note: A positive unity causes the rotation to seek to yield a positive loading
for the variable in question, a negative unity seeks to yield a negative loading,
and a zero leaves the loading for that variable unconstrained.

Table 2 (cont'd)

Rotated Factor Loading Matrix for 23 Personality Variables: Set A 

Note: Rotation for data on Tables 2 and 3 was by an orthogonal Procrustes criterion.

Table 4

Factor Reliabilities for the Parallel Real Data Analysis
and the Parallel Random Data Analyses

              Real Data Analysis          Random Data
Factor     Uncorrected    Corrected(a)     Analysis

I          [illegible]       .85             .04
II            .59            .70             .04
III           .49            .65             .04
IV            .50            .66             .01
V             .56            .72             .07
VI            .53            .70            -.11
VII           .56            .72            -.12

(a) Spearman-Brown formula.

Correlations of Scores for Seven Factors from Each of
Two Sets of Principal Components Factor Analyses.
Minor Diagonals Are Factor Reliabilities

Comparison of factor analyses of real and random data in terms of the percentage
of factor loadings in specified direction for acquiescence and content factors.



